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O Abstract 

The likelihood-free sequential Approximate Bayesian Computation (ABC) algorithms, are 
^ increasingly popular inference tools for complex biological models. Such algorithms proceed 

by constructing a succession of probability distributions over the parameter space conditional 
upon the simulated data lying in an e-ball around the observed data, for decreasing values of 
the threshold e. While in theory, the distributions (starting from a suitably defined prior) will 
converge towards the unknown posterior as e tends to zero, the exact sequence of thresholds can 
impact upon the computational efficiency and success of a particular application. In particular, 
we show here that the current preferred method of choosing thresholds as a pre-determined 
C^l quantile of the distances between simulated and observed data from the previous population, 

can lead to the inferred posterior distribution being very different to the true posterior. Thresh- 
old selection thus remains an important challenge. Here we propose an automated and adaptive 
method that allows us to balance the need to minimise the threshold with computational effi- 
ciency. Moreover, our method which centres around predicting the threshold - acceptance rate 
. . curve using the unscented transform, enables us to avoid local minima - a problem that has 

^ plagued previous threshold schemes. 

c3 1 Introduction 

Mathematical models have become powerful tools for both summarising our current biologi- 
cal understanding, and generating novel hypotheses. However, as our models become more 
ambitious in size and complexity, the computational challenges of a number of tasks such as 
parameter inference and model validation are increasingly vast. For large, complex or stochastic 
models, exploring the likelihood surface can be too complicated or numerically too demanding, 
even though it is possible to simulate the model. For this reason, likelihood-free methods such 
as Approximate Bayesian Computation (ABC), and its more efficient sequential versions, are 
becoming increasingly important. 
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Sequential ABC algorithms proceed by constructing a succession of probability distributions 
over the parameter space conditional on the simulated data lying in an e-ball around the ob- 
served data and use decreasing values of the threshold e to incrementally approximate the true 
posterior distribution. While in theory, the distributions (starting from a suitably defined prior) 
will converge towards the unknown posterior as e tends to zero, in practice the exact sequence 
of thresholds can have a great impact on the computational efficiency and success of a par- 
ticular application. Currently, thresholds are typically chosen as a pre-determined quantile of 
the distances between simulated and observed data from the previous population, or simply by 
intuition - the drawbacks of which are made clear in the results below. 

Here we present an automated and adaptive method for threshold choice that is based upon 
predictions of the future distributions using the unscented transform (UT). Generally known 
for its use in extending the Kalman filter to non-linear problems, the UT allows the statistics 
of a Gaussian random variable that has undergone a non-linear transform to be estimated. In 
combination with Gaussian mixtures, the UT can be used to predict the ABC acceptance rate for 
any threshold value, and subsequently choose the acceptance thresholds that optimally balances 
the need to minimise e with computational efficiency. Further, knowledge of the complete 
threshold - acceptance rate curve can enable us to avoid local optima — a problem that, as 
is often but perhaps not always acknowledged, has plagued previous threshold schemes. We 
will show below that this problem is particularly pertinent for schemes that choose threshold 
schedules from quant iles of the previous population of accepted particles. 

The remainder of the paper is organised as follows: we first introduce ABC SMC and use 
a toy model to discuss the challenges of threshold selection, and in particular, the difficulty of 
avoiding local optima. We then describe the proposed threshold selection scheme for inference 
on deterministic models (with Gaussian measurement error); although within an ABC filtering 
framework, a natural extension allows applications also to stochastic state-space models. Finally 
we compare the performance of the new adaptive method with various fixed quantile schedules 
for inference on both toy and biological systems, including two biochemical oscillators. 



2 Adaptive Sequential Monte Carlo methods in Approxi- 
mative Bayesian Computation 

The aim of ABC is to obtain a good and computationally affordable approximation to the 
posterior distribution 

p{e\x*) ^ f{x*\0)7r{0), 

where 7r{0) denotes the prior distribution over the parameter space and f{x*\0) is the likelihood 
of the observed data for a given parameter, 0. Rather than evaluating the likelihood directly, 
which for many real-world problems can be intractable, ABC-based approaches use systematic 
comparisons between real and simulated data. The main principle consists of comparing the 
simulated data, x, with the real data, x*, and accepting simulations if a suitable distance measure 
between them, A(x,x*), is less than a specified threshold, e. The ABC algorithm thus provides 
a sample from the approximate posterior of the form, 

p((9|x*) ^ oc J f{x\0) 1 (A(x, X*) < e) 7r{0)dx . 

The simple ABC scheme outlined above suffers from the same shortcomings as other rejection 
samplers: most of the samples are drawn from regions of parameter space which cannot give 
rise to simulation outputs that resemble the data. Over the past few years many improvements 
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to these algorithms have been proposed that raakes ABC inference more efficient: regression- 
adjusted ABC [l]-[3j, Markov chain Monte Carlo ABC schemes [T,^, and ABC implementing 
variants of sequential importance sampling (SIS) or sequential Monte Carlo (SMC) |6-10 . We 
will focus here on the last form, which appears to emerge as the most popular framework for 
complex inferential problems 11 . 

ABC methods based on SIS or SMC samplers aim to sample from a sequence of distri- 
butions, which increasingly resemble the target posterior; they are constructed by estimating 
intermediate distributions Pe^{0\x) for a decreasing sequence of {€t}i<t<T- In this article, we 
focus on the implementation of Toni et al 7 and Beaumont et al 8 described in Algorithm [l] 
This implementation that we will call in the following ABC SMC differs from the ABC SMC 
algorithm of Del Moral et al 9 and Drovandi et al flO' in a number of points, and proceeds 
as follows: the first population of particles is constructed using the rejection ABC algorithm 
described above with a sufficiently large value of ei such that many particles are accepted: the 
parameters are drawn from the prior distribution 7r(^), and are accepted only if the distance 
between the simulated and observed data is smaller than ei. We denote by {^*^*'^^}i<i<Ar the 
set of accepted particles at step and by {x*^*'^^}i<i<Ar the corresponding simulated data. Each 
particle has an associated weight cj^*'^^; in the ffist population all weights are equal to 1/N. 
For each intermediate population t, a parameter {^^*'^~^^}i<i<Ar is sampled from the previous 
population, t — 1, with probability defined by the weights, {co'^*'^~^^}i<i<Ar, and perturbed using 
a perturbation kernel^ ~ Kt{-\0); the parameter 6 is then accepted if and only if the distance 
between the simulated and the observed data is smaller than e^. These sample, perturbation, 
simulation and acceptation/rejection steps are repeated until N particles have been accepted. 
The weight of each particle is then computed as 



E7^,a;0-'^-i)i^,(^(^'^)|^0-'^-i)) ' 

The efficiency of the sequential ABC algorithm described above strongly relies on the choice 
of the perturbation kernel {Kt{-\-)}t as well as the sequence of thresholds {e^j^. Over the 
past years adaptive methods to choose perturbation kernels have gained popularity. Beaumont 
et al 8 ffist suggested to use a componentwise-normal perturbation kernel with an adaptive 
choice for the variances. Filippi et al 12^ then generalized this approach to a multivariate normal 
perturbation kernel and compared the efficiency of the ABC SMC algorithm for a selection of 
adaptive covariance matrices. 

Until recently, there has been no systematic way of determining the threshold sequence. The 
ideal threshold scheme is the one that minimizes the total number of simulations since this is 
typically the most computationally expensive part of any ABC algorithm. This requires a careful 
balance between a small number of populations i.e. a rapidly decreasing sequence of thresholds, 
and a high acceptance rate per round which generally happens if the difference between two 
consecutive thresholds is small enough. In the following, we denote by the acceptance rate 
for the round t which is equal to ratio of the population size, and the number of times the 
model has been simulated during the t-th round. Perhaps the most commonly used adaptive 
scheme for threshold choice is based on the quantile of the empirical distribution of the distances 
between the simulated data from the previous population, and the observed data (see [8,,13j and 
in a different way [9|[To]). The method determines et at the beginning of the t-th round by 
sorting the distances {A(x'^*'^~^\ x*)}i<i<Ar and setting et such that a percent of the simulated 
data {x*^*'*~^^}i<i<Ar are below it, for some predetermined a. 

A severe drawback of this quantile approach for threshold selection is that the final ABC 
posterior distribution p^^ {0\x^) may end up being very different to the true posterior p{0\x*). In 
particular, if particles are sampled from a large region of parameter space that offers negligible 
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Figure 1 : Global and local minima, (a) Plot of the distance between simulated and true data for different parameter 
values. The model was designed to produce a severe global optimum at the true parameter value ^ = 3, and a broad 
local optimum associated with distances no smaller than 50. The red dots arranged in horizontal lines are the 
members of successive ABC accepted populations. After the sixth population, parameters near the global optimum 
are no longer sampled, (b) Plots of the failure rates (grey), and total number of simulations (red) for a range of fixed 
quantile schedules. The red dots indicate the means over 100 ABC SMC runs, while the upper and lower whiskers 
are the maximum and minimum values respectively. 

or little support for the posterior distribution, there is a risk of getting stuck in this parameter 
region if the threshold is selected using a quantile method. As an illustration we consider a toy 
model where for each 0, the simulated data is x = g{0) = {6 — 10)^ — 100 exp(— 100(^ — 3)^). 
Moreover, we suppose that the true data are generated using the parameter ^* = 3. The support 
of the posterior distribution should then contain this parameter value. 

Figure [l] represents the LI distance between the simulated data g{0) and ^(3) as a function 
of 0. In this example, for all the parameter space except in the interval (2.92, 3.08) the distances 
are larger than 50. ABC SMC is used on this example, selecting the threshold sequences as 
the 0.8 quantile of the previous population's distances. The prior distribution is Gaussian with 
mean 10 and variance 10. Particles from successive populations are represented by the red dots 
in Figure [l] with population t aligned along the horizontal line y = dt where dt is the maximum 
distance between ^(^^*'^^) and ^(3) for all 1 < i < A/". For example, max^ \g{0^'^'^'^) — g{3)\ being 
equal to 150, the first population is represented by dots on the line corresponding to y = 150. We 
note that for all t the distributions Pet(^k*) are centred around 10 and that the true parameter 
3 has a very low probability under the final distribution, where the value of the threshold has 
converged. Repeating this inference for different values of a, leads to very different results 
(figure [T]). The optimal (or at least safe) choice of a thus depends on the data, the model and 
the prior range. We feel that this problems highlights the potential issues arising in real- world 
applications. 

The fundamental idea underlying our approach is to predict reliably and cheaply how the 
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acceptance rate depends on the value of the threshold e. For this example, the key is to avoid 
areas where the acceptance rate is excessively high. Too high an acceptance rate means to 
overly reward particles that are similar to the ones from the previous population; this in turn 
will lead to particle populations tending to drift in parameter space, whence broad but shallow 
local optima are explored more frequently than they ought to be compared to smaller regions 
that have higher posterior probability. 

Below we present an automated and adaptive method for threshold choice based upon pre- 
dictions of future distributions. We show that by predicting acceptance rates over all thresh- 
old values, we are able to detect possible local minima and choose thresholds so that they are 
avoided. Furthermore, global knowledge of these acceptance rates allows us to balance threshold 
reduction and computational cost at each round of the ABC SMC algorithm. We illustrate the 
method with applications to toy and biological dynamical systems, comparing the performance 
of our method with various quantile selection schemes. 



3 The threshold - acceptance rate curve 

The proposed method centres around understanding the acceptance rate Ht(e) as a function of 
the threshold e for the next round of ABC simulations. We ask, if the threshold - acceptance 
rate curve were known, how should e be chosen in order to optimally balance computational 
efficiency with the need to minimise e, and to avoid getting stuck in regions of parameter space 
that share little support with the posterior distribution? 

Figure |2] shows the threshold - acceptance rate curves for a variety of models. Although 
the structure of the proposal distribution and likelihood surface could give rise to anything 
monotonic increasing, we tend to encounter three main types or curve (shown in figure |2|; 
concave, convex and sigmoidal. Further, for their interpretation it helps to consider each curve 
as a combination of convex and concave parts. A concave shape occurs over a particular range 
of threshold values, when the majority of particles drawn from the perturbed distribution (with 
distances relevant for this range) give rise to simulated data that is relatively close to the 
observed data. The opposite is true for convex shapes. 

One way the latter can happen is when the perturbed distribution spans a region of parameter 
space that includes both a sharp global maximum and a broader local maximum of the likelihood. 
To see this, one can imagine an e-ball expanding about the true data; at first the ball only 
encompasses a small number of particles that were drawn from very close to the global maximum, 
corresponding to the low gradient at the foot of the shape. Once e is large enough we are able 
to accept the relatively large number of particles sitting in the local maximum, which causes the 
increase in gradient. This is exactly the scenario described in the toy example above where the 
ABC SMC algorithm is seen to fail when using various quantile strategies to select the threshold 
values (figure [T]) , and is likely to be a common occurrence in biological systems where likelihood 



surfaces are known to be highly complex 14 151. Indeed below we present such an example 



involving the smallest possible biochemical system that can exhibit an oscillation inducing Hopf 



bifurcation 16,17 



This interpretation of convex shapes as a symptom of sampling from local optima in the 
posterior suggests the following criterion for threshold selection: the ABC SMC algorithm can 
get stuck when a threshold schedule allows particles from a relatively broad local optimum to 
be accepted too frequently and for successive populations — at each round of the algorithm we 
risk not sampling from the correct posterior (i.e. that part of the posterior that account for 
"almost all" of the probability mass of the posterior) at all; diffusion of the parameter particles 
to the incorrect area in parameter space is thus entropically driven. This can be avoided by 
choosing a threshold that rejects these particles with high probability. From the threshold - 
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Algorithm 1 ABC SMC algorithm 



1: input: a decreasing sequence of thresholds, (et)i<t<T such that = e, a data a sequence 

of {Kt{\))l<t<T 

2: output: a weighted sample of particles from p^rj.{9\x) 
for all 1 < t < T do 

determine the perturbation kernel and the next threshold et 

i ^ 1 
repeat 

if t=l then 

sample 6 from 7t{6) 
else 

sample 6 from the previous population {9^^'^~^^i<i<N with weights {uj^'^'^~^'^}i<i<N 
sample from Kt{'\0) and such that 7r{0) > 
end if 

sample y from f{'\0) 
if A(?/, x) < et then 

z 4- i + 1 
end if 
until i = N + l 

calculate the weights: for all 1 < i < A/" 
if t 7^ 1 then 



else cc;(^'-^) ^ 1 
end if 

normalize the weights 
end for 
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Fi gure 2: (a) Epsilon— acceptance rate curves for different models (clockwise from top left); a Gaussian p.d.f, a 
quadratic function, the Repressilator, and the smallest biochemical system exhibiting a Hopf bifurcation. Solid lines 
indicate mean values of 100 repeat predictions, with dotted lines the minimum and maximum predicted values. Red 
dots indicate the actual acceptance rate of an ABC run with corresponding threshold values, (b) Typical threshold 
- acceptance rate curve shapes. Red dots indicate the threshold values considered by our method - either at the 
extreme of a convex section to avoid a possible local optimum, or the point argmin^ A((^^^, ^^(if^T))' -'-)) ^^^^ 
balances reduction of e with computational expense. In the case where multiple threshold values satisfy the latter 
condition, we select the smallest one. 

acceptance rate curve we can identify such a threshold value as one that lies at the bottom of the 
steep incline, i.e. argmax^ For concave shapes the same danger is not apparent from the 

curve, and we can instead try to balance computational expense against the desired reduction 
in e. Here we treat the threshold - acceptance rate curve in the same spirit as we would treat a 
ROC curve, identifying the optimal "cut-point" as that which minimises the distance between 
(— v>^/^^^ N ) and (0, 1). Similar thresholds can be defined when the relative tradeoffs between 
the need to reduce the threshold and computational expense are weighed differently. 

We can now state our proposed threshold selection method given the threshold - acceptance 
rate curve: 

1. If t > 0, define dmin to be the minimum distance produced by past simulations. 

2. Define e* = argmax^ ^^^^ - 

3. If Ht(e*) > or e* > dmin (for the case t > 0), set = e*. (Detection of a possible local 
minimum) 

4. If Ht(e^) < S and e* < dmin^ choose Ct = argmin^ A((^^, )? (0? !))• (Reduction of 
threshold v.s. Computational expense) 

Of course, the shape of the threshold - acceptance rate curve is in general unknown but below we 
show that it is often possible to obtain useful predictions of this curve which allows us to "guess" 
near-optimal thresholds. Before we discuss this approach in detail we illustrate its considerable 
advantages in our toy model. 

3.1 Toy model redux 

We now revisit the toy model introduced earlier, and illustrated in figure 1. Recall that the 
existence of a broad local optimum and narrow global optimum, was allowing the ABC SMC 
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Fi gure 3: Application of our method to the toy model with a local optimum, (a) The threshold - acceptance rate 
curve for the toy model, with solid and dotted lines indicating the mean, maximum and minimum predicted values 
over 100 runs of our algorithm. The red dot indicates the value e* = argmax^ ^ de'^^^ ' Starting from a population 
spread across both the global and local minimum, a threshold value of approximately 50 rejects all parameters except 
those situated in a small interval about the true parameter value of 3. Successive populations refine the distribution 
about this value. 

inference to converge to a distribution that shared no support with the true posterior. In figure 
lb, we examine how likely this is to occur for different fixed quantile schedules, and find that the 
failure rate increases with the quantile value - for quantiles of 0.3 and higher, the failure rate 
is greater than 80%. This makes sense, as higher quantile values allow particles from the local 
optimum region to be accepted for more ABC runs. For each of these runs it is unnecessary to 
sample from the global optimum in order to reach N accepted particles, and so an opportunity 
exists to either miss it entirely, or to sample it sparsely enough that it is lost during perturbation. 
We also find that the number of simulations needed for convergence of the sequence of epsilons, 
reduces as the quantile increases - that is, the computational expense of failure is lower than 
success. 

The epsilon - acceptance rate curve for this model is shown in Figure |3^. The shape is 
sigmoidal with the position of the lower "elbow" at e ~ 50, which correctly predicts the minimal 
distance obtainable from parameters in the local optimum region. Setting epsilon to this value 
will cause nearly all particles from the local optimum to be rejected, and force particles closer 
to the true posterior to be sampled. Indeed, application of our method allows ABC SMC to 
converge to a distribution about the true parameter value every time for this example. One 
such inference is shown in figure |3]3, where the second population is already restricted to a small 
interval about the true parameter value. While the fixed quantile methods can be similarly 
successful e.g. for 0.05, in practice this occurs only with the good fortune of choosing a quantile 
that selects a threshold appropriately with respect to some unknown critical value (here 50) 
sufficiently quickly. 
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4 Estimating the threshold - acceptance rate curve 



The threshold selection scheme described above relies upon knowledge of the threshold - accep- 
tance rate curve. In this section we suggest a computationally inexpensive way in which it can 
be approximated. 

We first define formally the acceptance rate of the algorithm for round t > and any 
threshold, e by, 

M^) := J Pt{x)l (A(x,x*) < e)dy (1) 

where pt{x) is the distribution of the simulated data corresponding to parameters sampled from 
the last population and perturbed by the kernel i^t, i.e. 



pt{x) = j qt{0)f{x\0)d0 
with the perturbed distribution, qt{0)^ defined via 

N 



A simple way to estimate would be via Monte Carlo approximation, i.e. simulating data 

from a large sample drawn from qt{0). However, the expense of such a naive approach is generally 
prohibitive. Here we use the so-called unscented transform to approximate the distribution pt{x) 
of the model output given the distribution qtiO). The unscented transform (UT) 18 , tells us 
how the moments of a random variable, ^, are transformed by a non-linear function, g. Its 
computational efficiency and fiexibility to the form of the non-linear function, has led to its 



extensive use in filtering 19 and smoothing 20 algorithms, and its increasing popularity as 



a tool for parameter inference 21 22 and uncertainty propagation [23]. From here we will 
model our data as a non-linear transformation, ^, of the parameter with an additive zero- 
mean noise term. However, the method can be extended to stochastic state space models, with 
some limitations on the form of the observation model, or without these limitations in an ABC 
filtering framework. 

The unscented transform requires that the perturbed distribution, qt{0)^ is decomposed into 
a mixture of Gaussians, 

qt{e) ^Y^am{0) 

with each pi being a Gaussian density that can be fit by an EM algorithm. In general, the greater 
the number of components the more accurate the acceptance rate approximations become — 
we are not only trying to fit the input distributions but allow enough fiexibility to approximate 
a possibly complex, multi- modal output. Indeed we find that increasing degrees of non-linearity 
in g requires more Gaussian components in order to keep the accuracy at the same level (see 
figure [4| . 

For each component of the mixture, we use the UT to approximate, 

f{x\e)p,{e)de, 

as follows. The first step in the UT algorithm is to determine a set of weighted particles (called 
sigma-points) with the same sample moments up to a desired order as the distribution Pi{9). 
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Here we use a scaled sigma-point set {Xk}k=o,...2L that captures both means and covariances 24 , 
Xo l^e 



Xk l^e 
Xk l^e 



k = 

k = L + 1,...,2L 



where L is the dimension of 0^ jig and T^e are the mean and covariance of 9 ~ Pi{')^ [^k 
represents the kih column of a matrix A, and 

The sigma-point weights i^^}/c=o,...2L are given by, 

^0 - L+A 



1 



2(L+A) 



1,...,2L. 



and finally, the parameters a and /3 may be chosen to control the positive definiteness of 
covariance matrices, spread of the sigma-points, and error in the kurtosis respectively. While 



sigma-point selection schemes exist for higher moments 25 , 26 , they come with significantly 
increased computational cost. 

Once determined, each sigma-point is propagated individually through the function, ^, and 
the mean and covariance of the transformed variable, g{0)^ can be estimated using the update 
equations, 

2L 



k=0 
2L 

^9(0) - ^kidiXk) - ^g{e)){g{Xk) - ^g{e))^' (3) 

/c=0 

We denote the resulting approximate probability density function for g{0)^ for mixture compo- 
nent i by Up.{x). 

By matching terms in the Taylor expansions of the estimated and true values of these mo- 
ments, it can be shown that the above algorithm is accurate to second order in the expansion. 
More generally, if the sigma-point set approximates the moments of up to the n^^ order then 
the estimates of the mean and covariance of g{0) will be accurate up to the n^^ term 18 . Cru- 
cially, the number of points required (2L + 1 for this scheme) is much smaller than the number 
required to reach convergence with Monte- Carlo methods. 

Given the Up.{x)^ for each mixture component, we can approximate the distribution of the 
output X as follows, 

pt{x) « Z"*/ fix\^)Pi{^)d^ (4) 

i 

- (5) 

i 

Samples {xj}j=o,...,M from the mixture of Gaussians distribution in equation ([5| may then be 
used as an inexpensive proxy for ABC simulations, and the acceptance rates can be estimated 
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as, 

M 

where, 

H,{A{xj,x*)) = ,x (7) 



is used as a smooth approximation to the "accept and reject" indicator function, with k control- 
hng the severity of the step. The smooth approximation is necessary for estimating the critical 
value. 



of the proposed threshold selection scheme. 

In summary, the ABC SMC acceptance rate may be approximated for any threshold value 
at the beginning of each round t > of the algorithm using the steps: 

1. generate a population of perturbed particles, sampling from {^*^*'^~^\ cj*^*'^~^^}i<^<Ar and 
perturbing each particle independently with Kt^ 

2. fit a Gaussian mixture model to the perturbed population, 

3. estimate using the unscented transform independently for each component of the 
Gaussian mixture, 

4. estimate acceptance rates for different threshold values according to equation (|6| sampling 
from Y.i ^pA^)' 



5 Applications to biological models 

We now contrast our adaptive method to various fixed quantile threshold schedules in the 
context of two biological dynamical systems. We consider two criteria: firstly, the total number 
of simulations required to reach a pre-chosen threshold value; and secondly, the proportion of 
repeat runs that fail, i.e. get stuck in a local minimum, or fail to reach the threshold in a given 
(very long) period of time. 

5.1 Computational expense: Quantiles vs UT 

The repressilator has become a classic example of a synthetic biological oscillator 27 . It consists 
of six species (three mRNAs, (m^), and their protein products, (pi)), with regulatory links 
between them forming a single feedback loop — each protein inhibits the production of the next 
protein's mRNA. The dynamics of the species concentrations are governed by the first order 
differential equations. 
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Figure 4: Prediction errors. Plot showing the mean squared error in predicting acceptance rates over 10 values of 
e for different numbers of mixture components. Results for models of varying degrees of non-linearity are shown in 
different colours. 



dm I 

~dt~ ~ 

dpi _ 

dt ~ 
dm 2 

"dT ~ 

dp2 _ 
dt ~ 
dms 

~dt~ ~ 
dps ^ 
dt ~ 

where 6 = (n, /3,a,ao) is the parameter vector to be inferred. We set the initial species con- 
centrations to (mi,pi, m2,P2, ^3,^3) = (0.0,2.0,0.0,1.0,3.0), and generate some data by simu- 
lating the model with 6 = (2.0,4.0, 1000.0, 1.0), and "observing" the state of pi at time-points 
(4.0, 8.0, 20.0), subject to some small added zero-mean Gaussian noise with covariance 0.01/. 
With Gaussian prior distributions that encompass the true parameter values, we perform ABC 
SMC, choosing thresholds according to our method and a range of fixed quantile threshold 
schedules. The inferences are repeated 10 times for each method, and stopped once a round of 
ABC has been completed with a threshold value below a pre-chosen challenging threshold - in 
this case, 35. 

A comparison of the performance of each method is shown in figure [5j The computational 
expense of the quantile methods is found to vary significantly and in non-linear fashion, with the 
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best performing quantile (0.3) over three times cheaper than the worst (0.9). This highhghts the 
difficulty of choosing fixed threshold values that perform well. Our method scores similarly to the 
best fixed schedules (at approximately 4000 simulations), which suggests that it is successfully 
reducing the computational cost for this inference. 
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Figure 5: Box plots showing the total number of simulations required to reach a threshold value of 35 (with the L2 
distance function), for the Repressilator model and different threshold schedules. An order of magnitude difference 
exists in the computational expense of the best and worst performing quantile methods. Our adaptive method 
performs comparably to the best fixed quantile schedules. 



5.2 Oscillations and local minima 

The model below represents the simplest biochemical reaction system that permits a Hopf 



bifurcation 16 17 . It can be shown that this system, described by, 

^ = {Aki — k/^)x — k2xy 

dy _ _ 
— - -ksy + k^z 
at 

—— — k^x — k^z , 
at 

where, x, y^ represent the concentrations of three reactants, /c^, are the reaction rates, and, 
A, is the fixed concentration of a fourth reactant, displays a limit cycle for Aki = ks -\- k^ -\- k^. 
Further, when the true value of Aki is greater than the critical value, /C3 + /C4 + /C5, the bifurcation 
has an effect on the likelihood of producing a global maximum and broader local maximum, 
with the regions becoming more defined for larger data sets 17 . This is illustrated in the legend 
of figure [6j where the shapes of the log-likelihood (with respect to Aki) are shown for data sets 
of size T = (100, 200, 500), with fixed ki = 1 and the true value of Aki set at 5.5. Values of 
Aki below the bifurcation point {Aki = 3) are seen to have log-likelihood values which, in the 
case of larger T, are greater than those for values of Aki above the bifurcation point that do 
not belong to a small interval about the true value. 
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We repeat ABC SMC inferences 10 times for each of the data sets and for each threshold 
choice scheme. The total number of simulations needed to reach a target threshold value of 
a/sot (scaled according to the size of the data set), is recorded unless this grows above 100,000, 
in which case the inference is considered to have failed. By varying N we are able to examine 
the adaptability of our threshold choice method to different likelihood shapes and, moreover, 
perform a "stress test" by controlling how challenging it is to avoid the local maximum. 

Results comparing our strategy to various fixed quantile schedules are shown in figure [6j 
We find that in all cases, the expense of our method is comparable or cheaper than the best 
performing fixed quantile schedule, and further that the variability in cost between different 
data sets is smallest for our method. Moreover, our method is successful in all cases, while the 
fixed quantile schedules (with the exception of 0.3) suffer failures for T > 300; in one case (for 
a = 0.9) this happens for every repetition. For the larger quantiles, these failures are caused 
by the accepted population becoming trapped in the interval (0,3), while for the 0.01 quantile, 
the reduction in threshold can be too severe, which leads to a very low acceptance rate and 
computationally overly expensive inference. These observations suggest that our strategy is 
successfully adapting to the different likelihood shapes; thresholds are chosen that balance the 
need to avoid the local optima with minimising computational expense. 
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Figure 6: Plots showing the performance of our adaptive method and different fixed quantile schedules for parameter 
inference on the Hopf bifurcating system. For each data set and method, 10 inferences were performed. As shown 
in the legend, colours indicate the number of data points used for the inference, and crosses mark failures of some 
inferences to reach a fixed threshold value (scaled according to the number of points used), within 100000 simulations; 
the percentage failure is also shown. The legend also shows the shapes of the likelihood surface for each dataset, with 
the peak around the true parameter value narrowing as the set size increases. Note that the computational expense 
varies least across datasets for our method, which also suffers no failures. In comparison, the 0.9 quantile schedule 
always fails when using 500 data points. 



6 Discussion 

Knowing the relationship between the acceptance rate and the e threshold schedule applied in 
ABC SMC has obvious implications for the efficiency and computational affordability of ABC 
inference in complex inference tasks; obviously, ABC schemes should really only be used in 
cases where conventional likelihood-based inferences fail. Clearly, however, this relationship is 
unknown but as we have shown naive reliance on threshold schedules determined adaptively from 
quantiles of previous populations are fraught with a number of problems. The most commonly 
encountered problem is that gentle reductions in thresholds, e^, between populations have a 
tendency to lead populations of particles to diffuse into regions of low posterior probability. 
This, we feel is a powerful argument against quant ile-based adaptive schemes, in particular 
those were the computational cost is fixed, i.e. where the number of simulations is specified and 
a fixed fraction of the simulated particles with the smallest distances are used to make up the 
intermediate population. The attraction of fixed (or controlled) computational burden comes 
with the high risk of convergence to spurious and biased "posterior" distributions. 
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Here we have shown how the UT can be employed to predict the shape of the threshold - 
acceptance rate curve, and select a (near-) optimal threshold value. This approach is superior 
to the selection of a fixed quantile criterion for the threshold choice, as the optimal quantile 
depends critically on the problem at hand. The UT is an ancillary or supporting statistical 
inference step which allows us to fine-tune the technical parameters of the inference process; in 
no way does this interfere with conventional Bayesian practice or conventions. It merely allows 
us to overcome some of the limitations inherent to the ABC approach (where indicator functions 
replace continuous probability measures). 

We have here focussed on dynamical systems where the observed data are compared directly 
to the simulations, rather than summary statistics of real and observed data. Extension of this 



approach to inference using appropriate summaries 28 - 30 is in principle straightforward as the 



UT only aims to predict the shape of the threshold - acceptance rate curve. The remaining 
parts of the algorithm are not affected by this in principle, although in practice the Gaussian 
mixture model and other factors affecting efficiency and accuracy of the predictions may need 
to be considered carefully. 

It has to be kept in mind that at the moment we adopt a greedy procedure and predict 
only the next threshold. Providing a global choice of the threshold for all t is a much harder 
inference task. Although the overall number of simulations in the ABC SMC scheme (once all 
are determined) may be less than the number of simulations required by our greedy approach, 
we believe that the computational burden and complications inherent in determining global 
schedules are prohibitive. 
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